ReNoun: Fact Extraction for Nominal Attributes

نویسندگان

  • Mohamed Yahya
  • Steven Euijong Whang
  • Rahul Gupta
  • Alon Y. Halevy
چکیده

Search engines are increasingly relying on large knowledge bases of facts to provide direct answers to users’ queries. However, the construction of these knowledge bases is largely manual and does not scale to the long and heavy tail of facts. Open information extraction tries to address this challenge, but typically assumes that facts are expressed with verb phrases, and therefore has had difficulty extracting facts for noun-based relations. We describe ReNoun, an open information extraction system that complements previous efforts by focusing on nominal attributes and on the long tail. ReNoun’s approach is based on leveraging a large ontology of noun attributes mined from a text corpus and from user queries. ReNoun creates a seed set of training data by using specialized patterns and requiring that the facts mention an attribute in the ontology. ReNoun then generalizes from this seed set to produce a much larger set of extractions that are then scored. We describe experiments that show that we extract facts with high precision and for attributes that cannot be extracted with verb-based techniques.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Structural, Transitive and Latent Models for Biographic Fact Extraction

This paper presents six novel approaches to biographic fact extraction that model structural, transitive and latent properties of biographical data. The ensemble of these proposed models substantially outperforms standard pattern-based biographic fact extraction methods and performance is further improved by modeling inter-attribute correlations and distributions over functions of attributes, a...

متن کامل

Boolean Reasoning for Feature Extraction Problems

We recall several applications of Boolean reasoning for feature extraction and we propose an approach based on Boolean reasoning for new feature extraction from data tables with symbolic (nominal, qualitative) attributes. New features are of the form a 2 V , where V Va and Va is the set of values of attribute a. We emphasize that Boolean reasoning is also a good framework for complexity analysi...

متن کامل

Towards a General Technique for Transformation of Nominal Features into Numeric Features in Supervised Learning

Almost all of the machine learning problems require data preprocessing. This stage is especially important for problems where the datasets contain features of mixed types (i.e. nominal and numeric). An often practice in such cases is to transform each nominal features into many dummy (i.e. binary) features. Also many classification algorithms have preference of numeric attributes over nominal a...

متن کامل

Ordinal Measurement for Decision Aid: a Conceptual Framework an Research Agenda

It happens more and more often in decision aiding situations to be faced with ordinal or nominal information concerning the alternatives that are considered by a client/decision maker (hereafter we will use the term decision maker DM). By the term ordinal or nominal information we intend the fact that evaluation on attributes, descriptors, indexes, criteria etc. may be expressed on ordinal or n...

متن کامل

Anonymization of nominal data based on semantic marginality

Nominal attributes are very common in data sets about individuals, specifically medical data like patient healthcare records. Attributes of this type tend to be sensitive due to their personal nature. If public-use data sets need to be released, e.g. for clinical research purposes, data should be first anonymized. However, since most anonymization methods omit data semantics when dealing with n...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014